Action Recognition in Video Using Sparse Coding and Relative Features
This work presents an approach to category-based action recognition in video
using sparse coding techniques. The proposed approach includes two main
contributions: i) a new method to handle intra-class variations by decomposing
each video into a reduced set of representative atomic action acts, or
key-sequences, and ii) a new video descriptor, ITRA: Inter-Temporal Relational
Act Descriptor, which exploits the power of comparative reasoning to capture
relative similarity relations among key-sequences. In terms of the method to
obtain key-sequences, we introduce a loss function that, for each video, leads
to the identification of a sparse set of representative key-frames capturing
both relevant particularities arising in the input video and relevant
generalities arising in the complete class collection. In terms of the method
to obtain the ITRA descriptor, we introduce a novel scheme to quantify relative
intra- and inter-class similarities among local temporal patterns arising in the
videos. The resulting ITRA descriptor proves highly effective at discriminating
among action categories. As a result, the proposed approach
achieves remarkable action recognition performance on several popular benchmark
datasets, outperforming alternative state-of-the-art techniques by a large
margin.

Comment: Accepted to CVPR 201
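The abstract does not spell out the loss function, so the following is only a
minimal sketch of the key-frame selection idea, assuming per-frame descriptors
and a self-expressive sparse coding criterion solved with scikit-learn's Lasso;
the function name, the alpha penalty, and the usage-based ranking are
illustrative assumptions, and the paper's class-level "generalities" term is
omitted here.

    import numpy as np
    from sklearn.linear_model import Lasso

    def select_key_frames(frame_feats, n_keys=5, alpha=0.1):
        # frame_feats: (n_frames, dim) array of per-frame descriptors.
        # Sketch: reconstruct each frame from the remaining frames under an
        # L1 penalty; frames whose codes are used most act as representatives.
        X = frame_feats / (np.linalg.norm(frame_feats, axis=1, keepdims=True) + 1e-8)
        usage = np.zeros(len(X))
        for i in range(len(X)):
            D = np.delete(X, i, axis=0).T                # dictionary of other frames
            code = Lasso(alpha=alpha, fit_intercept=False,
                         max_iter=2000).fit(D, X[i]).coef_
            others = np.delete(np.arange(len(X)), i)     # map codes back to frame ids
            usage[others] += np.abs(code)
        return np.argsort(usage)[::-1][:n_keys]          # indices of key-frames

A full implementation would add the class-level term so the selected frames also
cover patterns shared across the category's videos.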
How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval
The knowledge representation community has built general-purpose ontologies
which contain large amounts of commonsense knowledge over relevant aspects of
the world, including useful visual information, e.g.: "a ball is used by a
football player", "a tennis player is located at a tennis court". Current
state-of-the-art approaches for visual recognition do not exploit these
rule-based knowledge sources. Instead, they learn recognition models directly
from training examples. In this paper, we study how general-purpose
ontologies---specifically, MIT's ConceptNet ontology---can improve the
performance of state-of-the-art vision systems. As a testbed, we tackle the
problem of sentence-based image retrieval. Our retrieval approach incorporates
knowledge from ConceptNet on top of a large pool of object detectors derived
from a deep learning technique. In our experiments, we show that ConceptNet can
improve performance on a common benchmark dataset. Key to this improvement is
the use of the ESPGAME dataset to select visually relevant relations from
ConceptNet. Consequently, a main conclusion of this work is that
general-purpose commonsense ontologies improve performance on visual reasoning
tasks when properly filtered to select meaningful visual relations.

Comment: Accepted in IJCAI-1
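As a sketch of how such filtering could work, assuming ConceptNet edges given as
(subject, relation, object) triples and ESPGAME annotations given as per-image
label sets: keep an edge only when its two concepts co-occur in image
annotations with high pointwise mutual information. The PMI criterion, the
function name, and the min_pmi threshold are assumptions standing in for the
paper's actual selection step.

    import math
    from collections import Counter
    from itertools import combinations

    def filter_visual_relations(edges, image_label_sets, min_pmi=1.0):
        # edges: iterable of (subject, relation, object) ConceptNet triples.
        # image_label_sets: one set of ESP Game labels per image.
        n = len(image_label_sets)
        single, pair = Counter(), Counter()
        for labels in image_label_sets:
            labels = set(labels)
            single.update(labels)
            pair.update(frozenset(p) for p in combinations(sorted(labels), 2))
        kept = []
        for subj, rel, obj in edges:
            key = frozenset((subj, obj))
            if len(key) < 2 or pair[key] == 0:
                continue                                  # never co-annotated: drop
            pmi = math.log((pair[key] / n) /
                           ((single[subj] / n) * (single[obj] / n)))
            if pmi >= min_pmi:                            # visually relevant edge
                kept.append((subj, rel, obj))
        return kept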
Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation
Advances in image processing and computer vision in recent years have
brought about the use of visual features in artwork recommendation. Recent
works have shown that visual features obtained from pre-trained deep neural
networks (DNNs) perform very well for recommending digital art. Other recent
works have shown that explicit visual features (EVF) based on attractiveness
can perform well in preference prediction tasks, but no previous work has
compared DNN features versus specific attractiveness-based visual features
(e.g. brightness, texture) in terms of recommendation performance. In this
work, we study and compare the performance of DNN and EVF features for the
purpose of physical artwork recommendation using transactional data from
UGallery, an online store of physical paintings. In addition, we perform an
exploratory analysis to understand whether DNN embedding features bear some
relation to certain EVF. Our results show that DNN features outperform EVF and
that certain EVF are better suited to physical artwork recommendation;
finally, we show evidence that certain neurons in the DNN might be partially
encoding visual features such as brightness, providing an opportunity for
explaining recommendations based on visual neural models.

Comment: DLRS 2017 workshop, co-located at RecSys 2017
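Both feature families plug into the same content-based ranking, so a comparison
like the one described can reuse one scoring routine. The sketch below, with
assumed names and a max-similarity aggregation, ranks candidate artworks by
cosine similarity to a user's past purchases and works identically on DNN
embeddings or EVF vectors; the brightness helper is one plausible
attractiveness-based EVF, not necessarily the paper's definition.

    import numpy as np

    def rank_candidates(owned_feats, candidate_feats, top_k=10):
        # owned_feats: (n_owned, dim) features of artworks the user bought.
        # candidate_feats: (n_cand, dim) features of items in the catalog.
        # The same routine runs on DNN embeddings or EVF vectors; only the
        # feature matrix changes, which is what makes the comparison fair.
        def l2norm(M):
            return M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-8)
        sims = l2norm(candidate_feats) @ l2norm(owned_feats).T   # cosine sims
        scores = sims.max(axis=1)              # closest owned item per candidate
        return np.argsort(scores)[::-1][:top_k]

    def brightness(img_rgb):
        # One attractiveness-based EVF: mean luma with ITU-R BT.601 weights.
        w = np.array([0.299, 0.587, 0.114])
        return float((img_rgb[..., :3].astype(float) @ w).mean())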